Developing an Effective Faculty Evaluation System

William E. Cashin 
Kansas State University 

Those striving for perfection in these [faculty evaluation] systems may be on a collision 
course with disappointment. Or they may have a more subtle, Machiavellian motive, calling 
for a degree of perfection that they know can never be achieved in order to sabotage the
whole effort. 

(Miller, 1987, pp. 26-27) 



Since the early 1970s a substantial literature has devel- 
oped about faculty evaluation. Two excellent books have 
been published in the last two years. The first published 
was Reflective Faculty Evaluation: Enhancing Teaching 
and Determining Faculty Effectiveness by Centra (1993); 
it was an extensive updating of his Determining Faculty 
Effectiveness (Centra, 1979). The second was Assessing 
Faculty Work: Enhancing Individual and Institutional 
Performance by Braskamp and Ory (1994), which repre-
sented a significant expansion of their earlier book,
Evaluating Teaching Effectiveness (Braskamp,
Brandenburg, & Ory, 1984), which dealt only with evaluating
teaching.

Other major contributions to the literature that require 
mention were written by Miller and by Seldin. Miller's 
1987 Evaluating Faculty for Promotion and Tenure was 
preceded by two books written by him in the early 1970s,
Evaluating Faculty Performance (1972) and Developing 
Programs for Faculty Evaluation (1974). Seldin’s Suc- 
cessful Faculty Evaluation Programs (1980) was followed 
by Changing Practices in Faculty Evaluation (1984) and
Evaluating and Developing Administrative Performance 
(1988). Each of these books cites many other books and 
articles on faculty evaluation. 

The point of this flurry of citations is this: as one reads the
different authors, one is struck by the high degree of
agreement among them. I would suggest that among 
those knowledgeable of the literature and experienced in 
the field, there is 80 to 90 percent agreement about the 
general principles that should guide effective faculty 
evaluation. The answers to the important questions are 
known, although not necessarily on every campus. 

The higher education rhetoric is almost universal in 
stating that the primary purpose of faculty evaluation is to 
help faculty improve their performance. However, an 
examination of the systems as actually used indicates that the
primary purpose is almost always to make personnel
decisions, that is, decisions about retention,
promotion, tenure, and salary increases (summative
evaluation). Summative evaluation is both legitimate and
necessary, and can serve to improve the institution.
However, it does not necessarily help the individual
faculty member improve (formative evaluation or
development).

Because every college and university makes personnel 
decisions, that is the primary focus of this paper. I will 
suggest 20 principles or steps in an effective faculty 
evaluation system that are repeatedly recommended in 
the literature. 

1. The institution, and the units within the institution,
must develop clear goals. Without criteria,
evaluation is impossible. But criteria require a context. 

The basic context for evaluating faculty is the mission or 
goals of the institution. These inform the goals of the 
subunits (e.g., colleges within a university) all the way
down to the foundation units, the department or division.
This does not mean that there should be a single,
monolithic set of goals. Colleges vary, academic fields
vary, departments vary, faculty vary, but the general 
context should be the mission and goals of the whole 
institution. (See Braskamp and Ory, 1994; and Diamond 
and Adam, 1993, for elaborations.) 

Unfortunately, at most institutions the goals are implicit.
Even more unfortunately, one often finds significant 
disagreement across various departments and their 
faculty about what the institution should be doing. For 
example, at many research universities not all depart- 
ments offer the doctorate; some may only offer the 
bachelor's, especially if there are satellite campuses. 
However, it is not unusual for the criteria for promotion to 
be constructed as though every faculty member were in a
doctorate-granting department. One interpretation of such
an approach is that this university should not be offering
undergraduate courses. If that were really true, then the
institution should stop offering undergraduate courses. 
However, it is rarely true. Therefore, everyone in the 
institution needs to be clear that part of the institution's
mission is to educate undergraduates, and quality under- 
graduate teaching must be recognized by the criteria and 
rewarded. 



2. Decide on the purpose(s) data will be used for 
before any data are collected. Every institution makes
personnel decisions. Even if there is no rank (therefore 
no promotion), and no tenure, and across-the-board
raises (therefore no merit pay, i.e., individual decisions
about salary increases), at least the institution needs to 
decide about retention. So one purpose of collecting 
data about faculty is always summative evaluation — 
making personnel decisions. 

However, if we go to the trouble to collect data (hopefully
accurate data), then why not also use (some of) it for
faculty improvement (formative evaluation)? We say that
we are primarily interested in improvement. But ask 
yourself, if faculty decided today that they wanted to 
improve their performance, what kind of systematic help 
is available from your institution? Most often it is to
talk with the department head. Department heads can be
very helpful, but on most campuses they also make
personnel decisions, a potentially serious conflict in
roles. Ideally there should be one or more instructional
consultants or master teachers on campus who are
available to the faculty and have zero input into personnel
decisions. Relatively few campuses offer such support.
Some campuses provide, or offer, mentors for some
faculty. Sometimes workshops are offered on selected 
topics. Simply talking with a colleague can be very 
helpful, but if you need substantive (time-consuming) 
help, you are really imposing on your colleague because 
such help is rarely taken into consideration by faculty 
evaluation systems unless it is an assigned responsibility. 
Similarly, faculty development committees that have no
release time do not represent a significant institutional
contribution to faculty development (although they may
represent a significant personal contribution on the part of
the committee members).

Faculty evaluation data are rarely used for the purpose of 
student advisement, i.e., to provide data to help students 
choose instructors or courses, although this is a perfectly
legitimate use of the data. Faculty evaluation data can 
also be used for institutional development, and for 
research, both legitimate but infrequent uses of the data. 

The primary reason the purposes or uses of the data must 
be decided upon before any data are collected is justice; 
it is unfair to collect data without everyone knowing who 
will receive what information for what purposes, because 
such knowledge can influence the responses of students 
and of others. When the rules of the game are known 
beforehand, the system is more likely to be accepted by 
the faculty, and also be more defensible if it goes to court 
(assuming that the rules have been followed). 

3. Use pilot programs when appropriate. This is not 
discussed much in the faculty evaluation literature, but is
emphasized in the educational change literature at least 
as far back as the 1970s (Lindquist, 1978). Say you are 
considering introducing a significant change into your 
evaluation system, e.g., using teaching portfolios for 
evaluation or peers rating course materials. Organiza- 
tions have a tendency to want to invent everything at 
home. Even if it were possible, it would be grossly 
inefficient. You would be well advised to contact several
institutions similar to your own and find out how they do it.
However, there comes a time when you need your own
experience, but not with everyone.

Decide on the first approximation of your proposed 
program, then obtain volunteers. The volunteers should 
be representative of the groups that will eventually 
participate if the program is adopted. Let the volunteers 
make suggestions about the program, then run the pilot. 
After the pilot, those running it and the volunteers should 
discuss the experience. You may have enough informa-
tion to decide you do not want to make the procedure part
of your evaluation system. Quite often you will decide 
that you need to make revisions and run a second pilot. 
Occasionally you will decide that with minor revisions the 
procedure could be adopted. In that case you can 
propose the procedure to the entire group of potential 
participants for discussion. One very important point: the
data collected during the pilot(s) should not be used
for evaluation. That means that during the pilot(s) you 
will need to collect double data on the volunteers. 

4. Significantly involve participants, especially
campus leaders, in the development of the system.
The primary reason for this involvement is acceptance 
and ownership. Involving the leaders among the faculty 
helps to make the evaluation system the faculty's system, 
not just the administration’s. To the extent possible, 
involve all of the faculty and other constituencies, e.g., 
students, trustees. If you want human beings to actively 
and constructively implement a system, give them a 
significant say in its development. Doing so is also likely 
to lead to receiving some useful suggestions. (See 
Farmer, 1990, for some elaborations.)

5. Foster extensive, open communication before, 
during, and after the adoption of the system. 

For some task-oriented (vs. people-oriented) administra-
tors (and faculty), spending all that time talking about
what you should do seems a terrible waste of time. If you
have a good idea, do it. But faculty evaluation is far more 
than a cognitive process; it is an affective one. It is about 
changing attitudes, values, traditions, and their 
attendant emotions. Any change in your faculty evalua- 
tion system will require what Bennis, et al. (1976) called a 
normative-reeducative strategy. You not only have to
change ideas, you have to change feelings. And discus-
sion helps change feelings. An empirical-rational strategy
(simply having a good idea) is not enough. A
power-coercive strategy (trying to force the faculty to
accept a position) is positively counterproductive.
“Wasting time” in talking out the proposed changes may
be one of the most productive things you can do!

6. Obtain support for the development of the system 
from high-level administrators. Leadership from the 
bottom is notoriously inefficient, and usually ineffective. If 
the top-level administrator(s) do(es) not support a pro- 
posed improvement, no matter how excellent the change 
may be, forget it. You will not be able to make a substan- 
tive improvement. 

7. Ensure that the system is flexible. This is extremely
important. Any system of faculty evaluation needs to be
concerned about fairness, which often translates into a
concern about comparability. The most obvious solution 
to the comparability problem is to use the same system 
for everyone. Using the same evaluation system for 
everyone almost guarantees that it will be unfair to 
everyone. Therefore, each department/division/academic 
unit should have documents that describe and give 
examples of how the institution’s evaluation system
applies to the characteristics and circumstances of that 
unit and its faculty. 

Not all departments have the same mission in the institu- 
tion, e.g., they may not offer the same degrees. So the 
weight given to research can vary. Even if departments 
offer the same degrees, fields differ. In some fields 
research usually means a book, in others a journal article, 
in still others a creative work. In some fields publication in 
some refereed journals counts for more than in other 
refereed journals. Acceptable teaching loads vary. How 
does a lecture compare with a lab or with giving individual 
music instruction? Each department needs to spell this 
out so when people from other fields are evaluating 
someone for promotion or tenure, they have some 
understanding of what is applicable in that field and do not
use the criteria from their own field.

The development of such documents usually takes a few 
years, and multiple iterations, because after the depart- 
ment has developed a first approximation of their criteria, 
the unit above (the college or university) must react.
Even after every level seems to have approved the
department’s system in the abstract, when a real case
comes up, disagreements are often discovered. If at all
possible, I suggest that after agreement in the abstract is
reached, some case studies be evaluated by representa- 
tives of the different levels to see what disagreements or 
differences in interpretation still remain. 

8. Ensure that the system is legal. This is a complex 
topic that I will not even attempt to address, other than to 
say, consult with your institution's attorney. And probably 
consult with another attorney because attorneys do not 
always agree. Centra (1993) has a chapter on “Legal 
Considerations in Faculty Evaluation." Braskamp and Ory 
(1994) have some pages on "Legal Principles." For a 
reference on general legal questions, see Kaplin and 
Lee's (1995) The Law of Higher Education.

9. Define major faculty responsibilities at the begin-
ning of the evaluation period. The traditional faculty
responsibilities are teaching, research, and service. I
suggest that we add advising (of students not in one’s
courses) as a separate responsibility because it is 
important and should be rewarded. Advising is important 
because when done effectively it can significantly en- 
hance the student's educational experience; it also helps 
to retain students.

Service deserves much greater weight than it typically 
receives. Effective committee work related to a 
department’s introductory course(s), or to the program for 
majors, can significantly enhance the effectiveness of the
instructional program. In some fields, e.g., education and
nursing, where there is significant supervision of students 
off campus, service responsibilities can become a major 
portion of a faculty member’s load. 

In some fields there may be a unique area of responsibil- 
ity. How many fields would consider international activi- 
ties important enough to be a separate category? (Not 
many.) However, in many departments of agricultural 
economics there is enough consulting overseas that it 
deserves to be a separate responsibility. 

There are two other areas that are not treated in much 
detail in the literature, but which deserve greater consider-
ation: professional competence and professional behav-
ior. I suggest that these should form the foundation for all
of the faculty responsibilities discussed above. They 
have always been included implicitly. AAUP (1990) has 
listed subject matter mastery and moral turpitude for 
decades. Professional competence not only includes 
degrees earned, but in some fields licenses (e.g., 
nursing) or certificates (e.g., the CPA in accounting). In 
almost every field, previous experience and special 
training impact competence (e.g., post-doctoral fellow- 
ships). Usually much of this information is available in the 
faculty member’s personnel file. 

Professional behavior is beginning to receive more 
explicit consideration (e.g., Dill, 1982; Wilcox & Ebbs,
1992). Professional behavior would include things like
ethical behavior related to teaching and research
(AAUP, 1990; APA, 1992; Braxton, 1994; CAS, 1988;
Svinicki, 1994; Tabachnick, et al., 1992). Other relevant
areas are non-sexist/non-racist behavior (Riggs, et al.,
1993), non-substance abuse, and legal behavior (e.g., 
is conviction of a felony grounds for dismissal at your 
institution?). Another area of concern is collegiality. 
Especially in small departments, an uncooperative, 
abrasive colleague can have a significant negative impact 
on the department’s effectiveness. The question is not 
whether it is reasonable to consider collegiality, but how 
to measure it in an accurate and unbiased way. Simply 
asking every faculty member to rate every other faculty 
member on collegiality is not sufficient. The School of
Agriculture at Tennessee Technological University has 
made a useful beginning. Their "Tenure-Track Review 
Ballot" lists specific collegial behaviors that are to be 
rated. Other possibly relevant behaviors relate to commit-
ment to the values of the institution (e.g., at church-
related institutions), relationship to authority, and interper-
sonal relationships (e.g., romantic relationships).

A serious reservation about making professional behavior 
a regular part of an institution’s faculty evaluation proce- 
dures is that in practice only negative behaviors would 
likely be used. It would be difficult, for example, for a faculty 
member to demonstrate that he or she was especially ethical 
in the classroom. However, including a general discussion 
on the expectation of professional behavior in the faculty 
handbook may be worthwhile just to make it explicit. Some 
faculty from some cultures may honestly have different 
concepts of what is acceptable behavior. 

10. Define faculty subresponsibilities at the beginning
of the evaluation period and determine their weight-
ing. Simply to list teaching or research is not sufficient.

In IDEA Paper No. 21 (Cashin, 1989) I suggested that 
there were seven aspects to teaching: subject matter 
mastery, curriculum development, course design, delivery 
of instruction, assessment of learning, availability to 
students, and administrative requirements. (Students 
know little or nothing about some of these.) Regarding 
research, Sundre (1992) lists 249 possible attributes of 
scholarship. Pellino, et al. (1984) identified six dimen- 
sions of scholarship: professional activity, research 
(publishing), teaching, service, artistic endeavor, and 
“engagement with the novel.” Service and advising may 
involve more than what is typically found on the 
institution's “Annual Faculty Activities Form.” Given the 
wide variation among academic fields and different 
departments, each unit must detail what is considered to 
be teaching, etc. when evaluating their faculty. 

Not only should the subresponsibilities be defined qualita- 
tively, but it is also highly desirable to decide on their 
weighting. First decide whether teaching, research, etc. 
should all count equally. (Usually not.) Then, for ex- 
ample, will delivery of instruction be weighted as much as, 
say, administrative requirements? (Rarely.) Will you use 
the same weightings for all faculty? (I hope not.) A 
typical approach is for the department to decide a range 
for each responsibility, e.g., service can vary from 10% to 
40%. Then individual faculty members negotiate their 
effort within the department’s guidelines. It is the 
chairperson's responsibility to ensure that the total mix
agreed upon covers all of the department’s goals. Thus, 
typically two or more meetings with faculty are required. 
Arreola (1995) details a weighting system using examples 
from teaching. Tucker (1984) describes a point system 
for faculty evaluation that implies a weighting of impor- 
tance for different activities. 
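
The negotiation described above can be sketched in a few lines of Python. This is only an illustration, not Arreola's or Tucker's system: the function names, the ranges for teaching, research, and advising, and all of the ratings are hypothetical. Only the 10% to 40% range for service comes from the text.

```python
# Hypothetical department guidelines: allowed share of effort per responsibility.
# Only the service range (10%-40%) is taken from the text; the rest are invented.
DEPARTMENT_RANGES = {
    "teaching": (0.30, 0.70),
    "research": (0.10, 0.50),
    "service": (0.10, 0.40),
    "advising": (0.00, 0.20),
}


def check_weights(weights, ranges=DEPARTMENT_RANGES):
    """Return a list of problems with a negotiated weighting (empty if none)."""
    problems = []
    for area in weights:
        if area not in ranges:
            problems.append(f"{area}: not a recognized responsibility")
    for area, (low, high) in ranges.items():
        w = weights.get(area, 0.0)
        if not (low <= w <= high):
            problems.append(f"{area}: {w:.0%} outside {low:.0%}-{high:.0%}")
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        problems.append(f"weights sum to {total:.0%}, not 100%")
    return problems


def weighted_composite(ratings, weights):
    """Weighted average of per-area ratings (all on the same scale)."""
    return sum(weights[area] * ratings[area] for area in weights)


# One illustrative faculty member; every number here is made up.
weights = {"teaching": 0.50, "research": 0.25, "service": 0.15, "advising": 0.10}
ratings = {"teaching": 4.0, "research": 3.0, "service": 5.0, "advising": 4.0}

problems = check_weights(weights)
composite = weighted_composite(ratings, weights)
print(problems)
print(round(composite, 2))
```

The check on the total is a stand-in for the individual negotiation. The chairperson's separate task of confirming that the mix across all faculty covers the department's goals is not modeled here.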

11. Define the sources of data to be used to evaluate
each subresponsibility at the beginning of the evalua-
tion period. This is not as straightforward as it might
seem. It is not enough to decide that students are going 
to be one source of data used to evaluate teaching. You 
must decide whether you are going to use student ratings 
and/or students’ comments to open-ended questions, or
interview data from a small-group instructional diagnosis 
conducted during the course, or interview data from 
graduating majors, or solicited or unsolicited letters from 
students, or the complaints of students, etc. Given that 
you are going to use student ratings, do you need ratings 
from all of the classes taught or only a sample? Will you
use the responses to all of the items or only selected 
ones? Will you use ratings if only half the students enrolled 
completed them? These decisions should be made before 
any data are collected and all of the faculty involved should 
have had the opportunity to provide feedback. 

12. Use multiple sources of data. Because all of the
data are imperfect, and usually statistically unreliable, 
many sources of data must be used for an accurate 
evaluation, not just department head’s data (impressions). 
I would suggest that there is no such thing as “objective” 
data to be used for evaluation. All of the data involve 
someone’s opinion or someone’s judgment: the students,
colleagues, administrators. However, this does not mean
that these opinions cannot be informed opinions. Even
something like grant dollars, which one might think were
certainly objective, involves someone’s judgment that it is
appropriate to count dollars because grant funds are not 
equally available in all fields. 

As a corollary, I would strongly recommend that depart- 
ments initially make a tentative decision, i.e., based on 
the present data this is what is recommended. This 
tentative decision and its basis should be communicated 
to the faculty member so that he or she could correct 
mistakes or add relevant information. Adopting such a
procedure can make the data more reliable and valid, and 
the decisions more acceptable. 

13. Ensure that the data/measures are technically
acceptable, i.e., are reliable and valid. Although each
kind of data or measure, taken separately, may be 
unreliable, when the combined data from several different 
sources agree, one has statistically reliable data. When 
they do not agree, if at all possible, obtain more data. 

Since there is no agreed upon definition of effective 
teaching, or of effective research, or service, or advising, 
it is impossible to prove the validity of any of our mea- 
sures except their face validity. That is, the data appear 
to be consistent with, for example, effective teaching. 

Only a few studies have attempted to research the validity 
of multiple sources of data (e.g., Marsh, 1982).

14. Specifically define the criteria and the standards
for each subresponsibility. Typically faculty handbooks
will talk about teaching, research, and service as faculty
responsibilities, and then state the supposed criteria: for
promotion “excellence” is required in two areas, and
“quality” in the third. But what is the standard? What kind
of student ratings, for example, does one need to be 
considered an excellent teacher? If I teach four courses 
and the student rating form uses a 5-point scale (so a “3”
might be considered a “C”, i.e., acceptable), and if I have C’s
in two courses, and an A in the third, is it acceptable for
me to have an F in the fourth? I would still have a C
average. Or is there some kind of critical cutoff, so that an F is
unacceptable even with three A’s? Averages have their
limitations. How would you evaluate a surgeon who had 
all A’s except for a D in eye-hand coordination? Specify- 
ing the standards is what is most lacking in faculty 
evaluation systems, probably because they are the most 
difficult to agree upon. Done right, the task will require 
several iterations over years, but without some kind of 
definition of criteria and standards, faculty evaluation is 
not only subjective, but often arbitrary and capricious. 
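
The difference between an average standard and a critical cutoff can be made concrete with a short Python sketch of the four-course example. The function name and thresholds are hypothetical, and mapping an A to 5 and an F to 1 is an assumption: the text fixes only a “3” as a C on the 5-point scale.

```python
from statistics import mean


def meets_standard(course_ratings, min_average=3.0, floor=None):
    """Apply an average standard, optionally with a per-course critical cutoff.

    min_average and floor are illustrative parameters, not values
    recommended by the text.
    """
    if mean(course_ratings) < min_average:
        return False
    if floor is not None and min(course_ratings) < floor:
        return False
    return True


# The four-course case on a 5-point scale:
# two C's (3), one A (assumed 5), one F (assumed 1).
ratings = [3, 3, 5, 1]

average_only = meets_standard(ratings)           # average standard alone
with_cutoff = meets_standard(ratings, floor=2)   # same average, plus a cutoff
print(mean(ratings), average_only, with_cutoff)
```

Under the average-only rule the single failing course is hidden by the A; adding a floor makes the same record fail. Which rule applies, and at what values, is exactly what a department has to decide in advance.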

15. Train the evaluators to evaluate. This is frequently
recommended; infrequently done. What the literature is
recommending is that everyone who supplies data to be
used in evaluation receive some kind of training. So, for
example, instructors could discuss the meaning of student
rating items with students. Peers rating course materials
could practice by rating case studies, first independently,
then discussing them in groups. Similarly, administrators
(or others) could view videotapes of classes and
rate them as they would for a classroom observation. Or 
complete portfolios could be evaluated by anyone who 
would have that responsibility. 

16. Train the supervisors in giving feedback. Role
playing the annual performance appraisal is perhaps the
best way to provide such training. An administrator
(evaluator) may feel very confident of the accuracy of his
or her judgment about a faculty member’s performance.
But try communicating that judgment to the faculty
member in a constructive way. The role playing of case 
studies can be very educational (even if unpleasant). 

One approach is to have the supervisors split into groups 
of three. One person plays the supervisor, another the 
faculty member, and the third observes. After the role 
play ends, the observer gives feedback to the supervisor 
and everyone talks about their reactions to the experi- 
ence. The training requires three iterations so that 
everyone experiences all three of the roles. 
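
The three-iteration rotation can be written out as a small Python sketch. The function name and the participant names are hypothetical; the three roles are the ones named above.

```python
ROLES = ("supervisor", "faculty member", "observer")


def role_rotation(trio):
    """Yield three rounds in which each person holds a different role each time."""
    if len(trio) != 3:
        raise ValueError("role playing is done in groups of exactly three")
    for shift in range(3):
        # Shifting the start position by one each round rotates the roles.
        yield {ROLES[i]: trio[(i + shift) % 3] for i in range(3)}


# Hypothetical participants.
rounds = list(role_rotation(["Ana", "Ben", "Chris"]))
for number, assignment in enumerate(rounds, start=1):
    print(number, assignment)
```

After the third round every participant has been supervisor, faculty member, and observer exactly once.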

17. Maintain appropriate confidentiality. On most
campuses the faculty handbook, or other statement of 
institutional policy, indicates that deliberations concerning 
personnel decisions are to be kept confidential (although 
occasionally state “sunshine" laws will include personnel 
decisions). Faculty and administrators should take such 
confidentiality as a very serious professional and ethical 
obligation. However, despite institutional policies requir- 
ing confidentiality, everyone should be aware that if the 
faculty member can make a case that there has been 
discrimination, the courts may require disclosure. (See 
the U.S. Supreme Court decision, University of Pennsyl-
vania v. EEOC, 110 S. Ct. 577 (1990).)

18. Reward effective performance. For a faculty
evaluation system to be effective, i.e., to impact faculty 
behavior, first, accurate discriminations must be made 
about the performance of different faculty members; 
second, the faculty must perceive that the discriminations 
are accurate; and third, based on those discriminations 
effective faculty must be treated differently from ineffec-
tive ones. On campuses with across-the-board raises,
which are basically pass/fail systems, the third condition
is usually lacking. Why bother to make fine discrimina-
tions if you are only going to put people into two catego-
ries, or on many campuses really only one, because
everyone usually passes?

19. Combine development with evaluation; have an
on-campus consultant. If an institution goes to the
trouble of collecting accurate information about a faculty
member’s performance, why not use some of it to help the
individual improve? Although the kind of data needed for
evaluation differs some from that needed for improve- 
ment, there can be considerable overlap. Institutions say 
they want to help faculty improve but often have little 
systematic help available for anyone who wants to 
improve. The ideal situation for development is to have
someone from the faculty with assigned responsibility to 
help faculty improve. This does not require a large center 
or office; releasing a faculty member whom the other
faculty trust from one course a term, or from part of their
research obligation, etc., plus a modest budget, is a 
useful beginning. Then let experience determine the rate 
of growth. 

20. Review the system periodically. Nothing con- 
ceived by human beings will ever be perfect, especially 
something as complex and sensitive as faculty evaluation. 
Initially, if you are making major changes in your evalua- 
tion system, you should review it every year. Eventually 
you need only review the system every three to five years. 
The system should be viewed as organic and dynamic. It 
will need to grow and change, if only because circum- 
stances change, but more importantly to become better. 

Conclusion. As you have probably already concluded, 
developing an effective faculty evaluation system is
time-consuming. This cuts both ways. Occasionally I will hear
of a campus where the board of trustees has given the
institution three or six months to make major changes in
their evaluation system, e.g., moving to a merit-pay
system or changing to universal use of teaching portfolios.
To change that quickly almost guarantees a poor result.
The process is not just a cognitive one, changing ideas; it 
is a normative-reeducative one, changing values and 
attitudes. For a system to be effective (to really change
faculty behavior), it needs to be accepted by the faculty.
It must be owned by them. Acceptance and ownership
require a lot of time-consuming discussion, but hopefully
you are building a system of some permanence, not just 
something to use until the next change in administration. 
On the other hand, if you have a reasonably effective 
evaluation system in place and the board gives you 
several months to make it explicit, start now because it 
will take two or three times longer than you plan. 